Cross-Language Information Retrieval for Technical Documents
Authors
Abstract
This paper proposes a Japanese/English cross-language information retrieval (CLIR) system targeting technical documents. Our system first translates a given query containing technical terms into the target language, and then retrieves documents relevant to the translated query. The translation of technical terms is still problematic in that technical terms are often compound words, and thus new terms can be progressively created simply by combining existing base words. In addition, Japanese often represents loanwords based on its phonogram. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we use a compound word translation method, which uses a bilingual dictionary for base words and collocational statistics to resolve translation ambiguity. For the second problem, we propose a transliteration method, which identifies phonetic equivalents in the target language. We also show the effectiveness of our system using a test collection for CLIR.
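The compound word translation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the bigram counts, and the scoring function (a sum of log add-one-smoothed bigram counts) are all hypothetical, chosen only to show how collocational statistics can pick one translation among the candidates a base-word dictionary offers.

```python
from itertools import product
from math import log

# Hypothetical toy base-word dictionary (Japanese base word -> English candidates).
BASE_DICT = {
    "情報": ["information", "intelligence"],
    "検索": ["retrieval", "search"],
}

# Hypothetical collocation (bigram) counts, as if taken from a target-language corpus.
BIGRAM_COUNTS = {
    ("information", "retrieval"): 120,
    ("information", "search"): 30,
    ("intelligence", "search"): 2,
}


def score(candidate):
    """Sum of log add-one-smoothed bigram counts over adjacent word pairs."""
    pairs = zip(candidate, candidate[1:])
    return sum(log(BIGRAM_COUNTS.get(pair, 0) + 1) for pair in pairs)


def translate_compound(base_words):
    """Return the candidate word sequence with the highest collocation score."""
    options = [BASE_DICT[word] for word in base_words]
    return max(product(*options), key=score)


translation = translate_compound(["情報", "検索"])
print(" ".join(translation))
```

Enumerating every combination is exponential in the number of base words. That is acceptable for short compounds in a sketch, but a real system would likely need a more efficient search.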
Similar resources
Applying Machine Translation to Two-Stage Cross-Language Information Retrieval
Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this pr...
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our sy...
Extraction of Training Sets for Experimentation with Cross Language Information Retrieval Systems
In this paper we focus on methods, models and tools for the extraction of bilingual training/test sets useful for the (semi-)automatic classification of textual documents. Such documents could be tutorials, technical specifications, articles, personal notes, etc. Another motivation for our research is the need to manage corpora of classified texts, especially parallel corpora (texts). We...
An Approach to Cross-Age and Cross-Cultural Information Access for Digital Humanities
1. Introduction. Since libraries hold collections of documents across ages, cultures, and even languages, they are inherently multi-age, multi-cultural, and multilingual. In the digital age, more and more historical documents are being digitized to preserve content written on deteriorating paper. Library, etc.). It means that more and more old text contents will be accessible on the in...
Journal title:
- CoRR
Volume cs.CL/9907007 Issue
Pages -
Publication date 1999